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Abstract 

We analyze a distributed algorithm for estimation of scalar parameters belonging to nodes in a mobile network from noisy 
relative measurements. The motivation comes from the problem of clock skew and offset estimation for the purpose of time 
synchronization. The time variation of the network was modeled as a Markov chain. The estimates are shown to be mean 
square convergent under fairly weak assumptions on the Markov chain, as long as the union of the graphs is connected. 
Expressions for the asymptotic mean and correlation are also provided. The Markovian switching topology model of mobile 
networks is justified for certain node mobility models through empirically estimated conditional entropy measures. 
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1 Introduction 

We consider the problem of estimation of variables in a network of mobile nodes in which pairs of communicating 
nodes can obtain noisy measurement of the difference between the variables associated with them. Specifically, 
suppose the u-th node of a network has an associated node variable x u G R. If nodes u and v are neighbors at 
discrete time index k, then they can obtain a measurement Cu,v{k) where 

The problem is for each node to estimate its node variable from the relative measurements it collects over time, 
without requiring any centralized information processing or coordination. We assume that at least one node knows 
its variable. Otherwise the problem is indeterminate up to a constant. A node that knows its node variable is called 
a reference node. All nodes are allowed to be mobile, so that their neighbors may change with time. 



The problem of time synchronization (also called clock-synchronization) through clock skew and offset estimation 
falls into this category, and provides the main motivation for the study. Time synchronization in ad-hoc networks, 
especially in wireless sensor networks, has been a topic of intense study in recent years. The utility of data collected 
and transmitted by sensor nodes depend directly on the accuracy of the time-stamps. In TDMA based communication 
schemes, accurate time synchronization is required for the sensors to communicate with other sensors. Operation on 
a pre-scheduled sleep- wake cycle for energy conservation and lifetime maximization also requires accurate knowledge 
of a common global time. We refer the interested reader to the review papers [1,2,3] for more details on time 
synchronization. 
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The relationship between local clock time r u (t) of node u and global time t is usually modeled as T u (t) — a u t + /?„, 
where the scalars a u ,/3 u are called its skew and offset, respectively [1,3]. A node can determine the global time t 

from its local clock time by using the relationship i = (r„(t) — j3 u )/& u as long as it can obtain estimates a u ,$ u of 
the skew and offset of its local clock. Hence the problem is clock synchronization in a network can be alternatively 
posed as the problem of nodes estimating their skews and offsets. It is not possible for a node to measure its skew 
and offset directly. However, it is possible for a pair of neighboring nodes to measure the difference between their 
offsets and logarithm of skews by exchanging a number of time stamped messages. Existing protocols to perform 
so-called pairwise synchronization, such as [4,5,6], can be used to obtain such relative measurements. The details 
will be described in Section 2.1. The problem of clock offset and skew estimation can therefore be cast as a special 
case of the estimation from relative measurements described above. If an algorithm is available to solve the scalar 
node variable estimation problem, nodes can execute two copies of this algorithm in parallel to estimate both skew 
and offset. Therefore we only consider the scalar case. In the context of time synchronization, the existence of a 
reference node means that at least one node has access to the global time t. This is the case when at least one node 
is equipped with a GPS receiver, in which case that node has access to the UTC (Coordinated Universal Time). 
If no node has a GPS receiver, then one node has to be elected to be the reference so that it's local clock time is 
considered the global time that everyone has to synchronize to. 

1.1 Related work 

Time synchronization in sensor networks can be classified into pairwise synchronization and global synchronization 
methods. In pairwise synchronization, a pair of nodes try to synchronize their clocks to each other. In practice this 
is often achieved by one of the nodes estimating its relative offset and/or skew with respect to the other node, so 
that the local time of the other node serves as a reference [4,5,6,7]. Precise definitions of relative offset and relative 
skew are postponed till Section 2.1. In global synchronization, also called network-wide synchronization, all nodes 
synchronize themselves to a common time. 

A common approach for global synchronization in sensor networks is to first elect a root node and construct a 
spanning tree of the network with the root node being the "level 0" node. Every node thereafter synchronizes itself 
to a node of lower level (higher up in the hierarchy) by using a pairwise synchronization method. Examples of 
such spanning-tree based protocols include Timing-Sync Protocol for Sensor Networks (TPSN) [8] and Flooding 
Time Synchronization Protocol (FTSP) [9]. Change in the network topology due to node mobility or node failure 
requires recomputing the spanning tree and sometimes even re-election of the root node. This adds considerable 
communication overhead. The situation gets worse if nodes move rapidly. 

Recently, a number of fully distributed global synchronization algorithms have been proposed that do not need 
spanning tree computation. Distributed protocols are therefore more readily applicable to mobile networks than 
tree-based protocols. Among the distributed synchronization protocols proposed, some are based on estimation of 
the skew and/or offset of each clock with respect to a reference clock (called absolute time synchronization). The 
algorithms proposed in [10,11,12,13,14] belong to this category. Another class of protocols estimate a common global 
time that may not be related to the time of any clock in the network. The algorithms proposed in [15,16,17] belong 
to this category, which we call virtual time synchronization. 

1.2 Contribution 

In this paper we consider the problem of distributed estimation of skews and offsets with respect to a reference clock 
in a mobile network for global absolute time synchronization, where the network changes with time due to nodes' 
motion. The common thread among virtual time synchronization methods mentioned earlier is the use of consensus- 
type algorithms to construct virtual skew and offsets that every node agrees to. In many applications, absolute time 
synchronization is preferable over virtual time synchronization. This occurs when the user of the sensor network is 
interested in the time of an event that is measured in an absolute reference time, such as UTC provided by a GPS 
unit on a base station. Therefore, in this paper we consider only absolute time synchronization. 

We analyze an algorithm for estimating absolute skews and offsets from noisy pairwise relative measurements of 
skews and offsets, which is a slight modification of the algorithms proposed in [18,10,12]. Though the algorithm is 
adopted from these earlier papers, the analysis in those papers were limited to static networks. Thus, little is known 
about how such an algorithm will perform in a mobile network. 
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The main contribution is that we analyze the convergence of the algorithm when the network topology changes due 
to the motion of the nodes, as well as random communication failure. We model the resulting time-varying topology 
of the network as the state of a Markov chain. Techniques for the analysis of jump linear systems from [19] are used to 
study convergence of the algorithm. We show that under fairly weak assumptions on the Markov chain, the proposed 
algorithm is mean square convergent if and only if the union of the graphs that occur is connected. Mean square 
convergence means the expected value and the variance of the estimates obtained by each node converges to fixed 
values that do not depend on the initial conditions. When the relative measurements are unbiased, then limiting 
mean is the same as the true value of the variable, meaning the estimates obtained are asymptotically unbiased. 
Formulas for the limiting mean and variance are obtained by utilizing results from jump linear systems. 

The algorithm we analyze bears a close resemblance to consensus algorithms. In fact, the estimation error dynamics 
turns out to be a leader-follower consensus algorithm, where the leader states - corresponding to the estimation error 
of the reference nodes - are always 0. However, existing results from consensus cannot be directly used to analyze the 
scenario examined in this paper. Consensus literature almost always treats the problem where all nodes participates 
in the consensus algorithm, i.e., "leaderless consensus", while ours is a "leader- follower" consensus since the reference 
nodes error state stays at 0. One may expect analysis of this case would be easier, but that turns out to be not 
the case. Even though the literature on consensus is extensive, the topic of consensus with both time- varying graph 
topology and additive measurement noise is considered only in a limited number of papers, e.g. [20,21,22,23]. There 
are several differences between the consensus algorithms studied in [20,21,22,23] and the error dynamics examined 
in this paper, which preclude using their results to perform the analysis. These include requirement of symmetry or 
balance in graphs/matrices, preassignment of time- varying gains that must be synchronized among all nodes, etc. 
None of these restrictions are imposed in our analysis (see Remark 4 for more details). 

Another contribution of the paper is to provide justification for the Markovian switching topology for mobile net- 
works. The Markovian switching model has also been used extensively in studying consensus protocols in networks 
with dynamic topologies [24,25,26,21]. For a network of static nodes with link drops, the Markovian switching model 
arises naturally from Markovian link drop model. In mobile networks, though, the only case where we can prove 
that a mobile network evolves according to a Markov chain is when nodes move according to the so-called random 
walk mobility model [27]. Although the Markovian switching assumption facilitates analysis, this assumption re- 
quires justification for more complex motion models. We use a technique from [28] to check if the graph switching is 
Markovian if nodes according to the so-called Random Waypoint Mobility (RWP) model. The RWP model is one of 
the most widely used mobility models for ad- hoc mobile networks [27]. We show that the resulting graph switching 
process can indeed be approximated well by a (first order) Markovian switching model. 

A preliminary version of this paper was presented in [29]. Compared to that paper, we make several additional 
contributions. While the paper [29] provided only sufficient conditions for mean square convergence, here we provide 
both necessary and sufficient conditions. An assumption of symmetry of certain matrices were made in [29], which 
is removed in the present paper. 

The rest of the paper is organized as follows. Section 2 describes the connection between the problem of estimation 
from relative measurements and the problem of skew/offset estimation, and then states the problem precisely. 
Section 3 describes the proposed algorithm and states the main result (Theorem 2). It also discusses the relevance 
of the Markovian switching topology model. Section 4 is devoted to the proof of the theorem. Simulation studies are 
presented in Section 5. 

2 The estimation problem 

We consider the problem of estimating the scalar parameters (called node variables) x u , u = 1, . . . , rib, where nb is 
the number of nodes in the network that do not know their node variables. We assume that there are n r additional 
nodes that knows their node variables, where n r > 1. These define a node set V = {1, . . . , n}, where n = rib + n r is 
the total number of nodes. For later reference, we define 'Vb '■= {1, • • • , nb} and 1^ = {rib + 1, . . . , tij, + n r }, so that 

V = 1i U V r . Note that n = rib + n r . Time is measured by a discrete time-index k = 0,1, The mobile nodes 

define a time-varying undirected measurement graph Q{k) = (1/, £(fc)), where (u, v) <E £(fc) if and only if u and v 
can obtain a relative measurement of the form (1) during the time interval between the time indices k and k + 1. 
Specifically, for each (it, v) € £(fc), there is a measurement Cu,v(k) = x u — x v + e UjV (k) that is available to both u 
and v at time k. In practice, one of the two nodes computes this measurement from sensed information. We assume 
that if u computes the measurement £ Ui „, it then sends this measurement to v so that v also has access to the same 
measurement. We follow the convention that the relative measurement between u and v that is obtained by the node 
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u is always of x u — x v while that used by v is always of x v — x u . Since the same measurement is shared by a pair of 
neighboring nodes, if v receives the measurement C„ iM from u, then it converts the measurement to C, v . u by assigning 
(v,u{k) := — ( u ,v(k). We assume, without any loss of generality, that between a pair of nodes u and v, the node with 
the lower index obtains the relative measurement between them first, and then shares with the node with the higher 
index. 

The neighbors of u at fc, denoted by 7V„(fc), is the set of nodes that u has an edge with in the measurement graph 
G(k). We assume that if v G Af u {k), then u and v can also exchange information through wireless communication at 
time k. Therefore, if one prefers to think of a communication graph, we assume that it is the same as the measurement 
graph. 

The task is to estimate the node variables x u for u — 1, . . . , n by using the relative measurements ( UyV (k), (u, v) e 

that becomes available over time k = 0, 1, In addition, the algorithm has to be distributed in the sense that 

each node has to estimate its own variables, and at every time k, a node u can only exchange information with its 
neighbors N u {k). Note that the estimation problem is indeterminate unless n r > 0. 

2.1 Relation to skew and offset estimation 

To see the connection between skew/offset estimation and the problem of estimation from noisy relative measure- 
ments introduced in the previous section, we first discuss the notion of pairwise synchronization between a pair of 
neighboring nodes u and v. By exchanging a number of time-stamped messages, it is possible for node u to estimate 
the so-called relative skew a UyV and relative offset f3 UjV between itself and v, where 

r u (t) = a u . v T v (t) + /3 UtV . (2) 

That is, the parameters u u . v and j3 u ^ v relate the local time of u to the local time of v at the same global time t. 
A number of methods are available that allows pairwise synchronization between a node pair from time-stamped 
messages [4,5,30,13,6]. The parameters a u , v and f3 UiV are also referred to as the skew and offset of node u with respect 
to node v [7]. 

The relationship between the absolute skew and offset a u , /3 U , a v , j3 v and relative skew and offset a uv ,/3 uv is given 

by 

(%u,v • — ft u ^ v '.— (5 U (3 V . (3) 

This relationship is obtained by expressing the local time r„ (t) of node u at global time t in terms of the local time 
T„(t) at node v at the same time t by using (1): 

T u (t) = a u { ) +/3 U = — T v (t) +Pu- fi v — , 

CVy Oi v Ot v 

and comparing with (2). Suppose a node u obtains noisy estimates a U:V , f3 UjV of the parameters a UtV ,(3 UtV by using 
a pairwise synchronization protocol. 

(1) We model the noisy estimate of a UtV as 

a u . v = exp(e s uv )a u , v (4) 

where exp(-) is exponential function and e s u v is a random variable. If the estimation error is small, then e s uv is 
close to 0. Taking log, we get 

loga^ = \oga u . v + e s u v = \oga u - loga v + e s u v . (5) 

Eq. (5) can be rewritten as v = x s u — x% + e s uv , with the definitions v := logd M ^ and xf := loga u , which 
makes v a noisy relative measurement of the node variables x s u and x s v ; cf. (1). It is important to notice that 
is a measured quantity - since a u . v is measured - while the variables x s u ,x s v , which are logarithms of the 
skews, are unknown. 
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(2) Similarly, the noisy estimate f3 u v of f3 uv with random estimation error e° v can be written as 

fiu.v — @u,v ~\~ e u ,v Pu Pv ~f" ^U,V (6) 

where e° v :— f3 v (l — a u /a v ) + e° „. Again, (6) can be rewritten as £° „ = ir° — i° + e° „, with the definitions 
C« u : — /3«,t; an d X S : — /^u) which makes ^° „ a noisy relative measurements of the node variables x° u and x°; 
cf. (1). In this case the node variables are the clock offsets /3 u 's. The noise e° v in the offset measurement is in 
general biased even if the measurement of the relative offset (3 U>V is unbiased. 

This discussion shows that the estimates of the relative skew and the relative offset between a pair of neighboring 
nodes, which can be obtained by existing algorithms for pairwise synchronization, can be expressed as a noisy 
relative measurement of node variables by appropriate redefinitions. The node variables are log-skews and offsets. 
Once node u obtains estimates x s u and x° u of its two node variables x s u and x°, it can estimate its skew and offset as 
a u :— exp(x* ) and $ u :— x° u . Thus, the problem of estimating the skews and offsets of all the clocks in a network 
can be transformed to an estimation from relative measurements problem, where relative measurements are of the 
form (I). 

Remark 1 From this point on, we only consider the estimation problem involving scalar node variables. This entails 
no loss of generality since estimation of the two scalar variables, skew and offset, can be performed in parallel. 
In the skew estimation problem, log-skews take the role of node variables and log a UtV 's obtained from pairwise 
synchronization take the role of relative measurements. In the offset estimation problem, node variables are the 
offsets and relative measurements are the U<V 's obtained from pairwise synchronization. The assumption on the 
existence of the reference node is equivalent to at least one node knowing the global time. This can be achieved by 
either one or more nodes having access to GPS time, or by arbitrarily electing a node as a reference and choosing 
its local time as the global time. 

3 Algorithm and results 

3.1 Algorithm for distributed estimation from relative measurement 

The algorithm we consider is adopted from [10,12,18], with minor modification to make it applicable to time varying 
networks. Each node u maintains in its local memory an estimate x u (k) of its node variable x u £ R. Every node - 
except the reference nodes - iteratively updates its estimate as we'll describe now. The estimates can be initialized 
to arbitrary values. In executing the algorithm at iteration k, node u communicates with its current neighbors to 
obtain measurements Cu,v(k) and their current estimates x v (k), v <E Af u {k). Since obtaining measurements require 
exchanging time-stamped messages, the current estimates can be easily exchanged during the process of obtaining 
new measurements. Node u then updates its estimate according to 

^W+XL^w^-W ' Ue b (7) 

x u , u e lS r , 

where the weights w vu (k) and w uu (k) are arbitrary positive numbers. The update law is well-defined even at times 
when u has no neighbors. Nodes continue this iterative update unless they see little change in their local estimates, 
at which point they can stop updating. The update procedure in each node u £ li is specified in Algorithm 1. 

Each node u is allowed to vary its local weights w uv (k) with time and use distinct weights for distinct neighbors to 
account for the heterogeneity in measurement quality. Between two neighbors p, q of node u at time k, the relative 
measurement Cu,p(k) between u and p may have lower measurement error than the relative measurement C, u ,q{k) 
between u and q. This occurs, for example, if u and p were able to exchange more time stamped messages than u 
and q before computing the relative measurements [6,7]. In this case, node u should choose its local weights at k so 
that Wp U (k) > w qu (k). Due to the denominator in (7), it is only the ratios among the weights that matter, not their 
absolute values. 
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Algorithm 1 Distributed update at node u 



Initialize estimate x u (0) <E R and local iteration counter k = 
while u is performing iteration do 
if M u {k u ) ^0 then 
for v e Af u (k) do 

if u does not have ( u ,v{k) then 

1. u and w perform pairwise synchronization: u obtains ( u , v (k), v obtains ( v , u (k); 

2. u and v exchange their current estimates: u saves x v (k); v saves x u (k); 
else 

u does not communicate with v; 
end if 
end for 

u updates x u (k + 1) using (7); 
else 

x u (k + 1)4- x u (k); 
end if 
k=k + 1; 
end while 



3.1.1 Asynchronous implementation 



The description so far is in terms of a common global iteration index k. In practice, nodes do not have access to such 
a global index. Instead, each node keeps a local iteration index. After every increment of the local index, the node 
tries to collect a new set of relative measurements with respect to one or more of its neighbors within a pre-specified 
time interval. At the end of the time interval, whether it is able to get new measurements or not, it updates its 
estimate according to the update law (7) and increments its local iteration counter. Now the index k in (7) has to 
be interpreted as the local iteration index. The process then repeats. It follows from (7) that if a node is unable to 
gather new measurements from any neighbors, then its updated estimate is precisely the previous estimate. 

The global iteration index is useful to describe the algorithm from the point of view of an omniscient spectator. 
Let T the time interval, say, in seconds, between two successive increments of the global index k. The parameter 
T is arbitrary, as long as is small enough so that no node updates its local estimate more than once with the time 
interval T. In that case, one of only two events are possible for an arbitrary node u at the end of the time interval 
when the global counter is increased from k to k + 1: (i) u either increases its local index by one, or (ii) u does 
not increases its local index. If a node increases its local index, both the local and global indices increase by one. 
A node does not increase its local iteration index if it is not able to gather new measurements. In the omniscient 
spectator's view, the node's neighbor set is empty at this time index; so according to (7), the next estimate of the 
node's variable is the same as the previous one. Thus, a node's local asynchronous state update can be described in 
terms of the synchronous algorithm (7); the latter being more convenient for exposition. We therefore consider only 
the synchronous version in the sequel. 



3.2 Convergence analysis with Markovian switching 



In this paper we model the sequence of measurement graphs {G{k)}^ =0 that appear as time progresses as the 
realization of a (first order) Markov chain, whose state space G = {Gi, ■ ■ ■ ,Gn} is the set of graphs that can 
occur over time. The Markovian switching assumption on the graphs means that P(Q(k + 1) = Qi\Q(k) = Gj) = 
P(G(k + 1) = Gi\G(k) = G 3 ,G(k - 1) = Gt, ■ ■ ■ ,G(0) = G P ) where Gi,Gj,Ge, ...,G P G G and where P(-) denotes 
probability. We assume that the Markov chain is homogeneous, and denote the transition probability matrix of the 
chain by V, in which pij is the (i,j)-th entry of V. Further discussion on Markov modeling of graphs is postponed 
till Appendix A. 

Let e u (k) := x u (k) — x u be the estimation error at node u. Since ( u ,v{k) — x u — x v + e u ,v(k), the update law (7) can 
be rewritten as 



e u {k + 1) = < 



»(k)e„(fc)+X;, 



™„ u (fc)(e„(fc)+e„,„(fc)) 



for u E % 



for u e V r . 



(8) 
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The right hand side of (8) is a weighted average of estimation errors of x u and measurement noise. If the measurement 
noise e UyV (k) is zero-mean and the initial estimates are unbiased, i.e. E[e u (0)] = 0, Vw G then E[e„(fc)] = for 
all k, where E[-] denotes expectation. 

The main result of the paper - on the mean square convergence of (8) - is stated below as a theorem. In the statement 
of theorem, e(fc) := [ei(fc), . . . ,e nb (k)] T is the estimation error vector. Moreover, fi(k) := E[e(fc)] is the mean and 
Q(fc) := E[e(fc)e(fc) ] is the correlation matrix of the estimation error vector. We say that a stochastic process y(fc) 
is mean square convergent if E[y(fc)] and E[y(fc)y T (fc)] converges as k — > oo for every initial condition. The union 
graph Q is defined as follows: 

G:=^ y Qi = (y^y ( E i \ (9) 

where is set of edges in Qi. We assume that the measurement noise e UyV (k) affecting the measurements on the edge 
(u, v) is a wide sense stationary process. We also assume that the measurement noise sequence t u ,v{k) and the initial 
condition x u (0), for any u, v, k is independent of the Markov chain that governs the time- variation of the graph. 

Due to technical reasons, we make an additional assumption that there exists a time kg after which the edge- weights 
do not change. The choice of weights during the transient period (up to fco) will affect initial reduction of the 
estimation errors but will not change the asymptotic behavior. 

Recall that e(fc) is the estimation error vector for the nodes who do not know their node variables, the main theorem 
is as follows: 

Theorem 2 Assume that the temporal evolution of the communication graph Q{k) is governed by an N -state ho- 
mogeneous Markov chain that is ergodic, and pa > for i = 1, . . . , N. The estimation error e(k) is mean square 
convergent if and only if Q is connected. 

Remark 3 The formulas for computing the limiting values lim / t_ ! . 00 and lim / ! _ ! . oo Q(fc) are provided in Lemma 6 
(Section 4)- It follows directly from the formulas (see Lemma 6), that if additionally all the measurements are 
unbiased, then lim^oo /x(fc) = 0. 

The implication of the theorem is that as long as nodes are connected in a "time-average" sense characterized by 
Q being connected, the estimates of the node variables will converge to random variables with a constant mean 
and variance, irrespective of the initial conditions. Thus, after a sufficiently long time, the nodes can turn off 
the synchronization updates without much loss of accuracy. The assumption of ergodicity of the Markov chain 
ensures that there is an unique steady state distribution and that the steady state probability of each state is 
non-zero [19]. This means every graph in the state space of the chain occurs infinitely often. Since their union 
graph is connected, ergodicity implies that information from the reference node(s) will flow to each of the nodes 
over time. None of the graphs that ever occur is required to be a connected graph. The assumption pa > means 
P(Q(k + 1) = Qi\Q(k) = Qi) > 0. This can be assured if the nodes move slowly enough. 

Remark 4 [Relation to consensus] The estimation error dynamics (8) can be interpreted as a "leader- following" 
consensus algorithm, where the state of node u is the estimation error e u (k) for u 6 14>, while the leader states are 
e r (k) = for r € 1v. Although the literature on consensus is extensive, the topic of consensus with time-varying 
graph topology and additive measurement noise is considered only in a limited number of papers, with [20,21,22,23] 
representing the state of the art in this topic. There are significant differences between the algorithm we analyze 
and those in [21,22], as well as between the results. First, the cited references deal with the leaderless consensus 
while our situation is that of a leader-following one. Second, the algorithms in [21,22] require that the nodes use 
a specifically designed time-varying weight sequence that satisfy a certain persistence condition: : they decay to 
while being square summable but not absolutely summable. That is {u> Ui „(fc)}J° £ ^2,^ (-\ for each pair (u,v). This 
condition is difficult to ensure unless the nodes have synchronized clocks to begin with. In contrast, we allow the nodes 
to vary their weights with time arbitrarily subject only to the condition that they stop doing so at a certain time. 
Furthermore, the results in [20,21,22] are established under the assumption the weighted Laplacian matriceP~\ of the 
directed graphs are balanced. Ensuring balanced weights require coordination between pairs of neighbors. In contrast, 



1 The Laplacian matrix of Qi is equal to Mi — Ni in this paper, where the definition of matrices Mi and Ni are given in 
Section 4. 
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we do not impose any kind of symmetry on the Laplacian matrices, so that each node can choose its weights without 
coordinating with its neighbors. Not imposing symmetry makes the analysis significantly more difficult. 



4 Proof of Theorem 2 



We consider a weighted directed graph Q(k) = ( ( l/,'E(k),W(k)) associated with undirected measurement graph 
Q{k) — ', £(&)). In particular, there exists an undirected edge (u,v) in Q(k), then there exist two directed edges 
(u,v) and (v,u) in Q{k). The weight matrix W{k) defined as 



W uv (k) := { 



w vu (k) > for (u,v) G T,{k) 
Wuu(k) > for v = u 
o.w. 



(10) 



Thus, given a measurement graph Q(k) and W(k), Q{k) is specified. See Figure 1 for an example of an undirected 
measurement graph and an associated directed weighted graph. The square non-negative matrices, D(k), M(k) 
and N(k) is defined as follows: D(k) is a n x n diagonal matrix made up of the diagonal entries of W(k), and 
N(k) := W(k) — D(k). M{k) is a n x n diagonal matrix with entry M uu (k) — J2u^v W uv (k). Furthermore, we define 
the m x nb basis matrix Df,(k), Mb{k) and Nb(k) as the principle submatrix of D{k), M(k) and N(k) obtained by 
removing those rows and columns corresponding to the reference nodes. Now, (8) can be compactly expressed as 

e(fc + l) = Jb(k)e(k) + B b (k)e(k), (11) 

where 

J b (k) := (M b (k) + Dbik^-HNbik) + Db(k)), 
B b {k) := (Mb(k) + D b {k))- l A b {k), 

e(fc):=[e 1 (fc) r ,...,e„ fc (fe) r ] T , (12) 
e u (k) := [e u ,i(fc), . . . e„,„(fc)] T ', 
A b {k) :^diag(N 1 (k),...,N nb (k)), 
N u (k) := [N ul (k),...,N un (k)l 



where vector e u (k) and N u (k) do not contain e UiU (k) and N uu (k) respectively. Note that diagonal matrix Mb{k) + 
Db{k) is always non-singular because diagonal entries in Db{k) are always positive as W uu (k) > for all u, and 
Mb(k) is nonnegative. When N uv (k) = 0, the corresponding e UyV {k) is taken to be an arbitrary random variable with 
mean and variance such that the stationary assumption is satisfied. Since these noise terms are multiplied by 0, this 
entails no loss of generality. Moreover, recall that e UiV (k) = —e ViU (k). 

As a result of the assumption that there exists a time kg after which the weight between two nodes do not change, 
the graph Q(k) uniquely determines the weight matrix W(k) for k > fco- Since there are N distinct graphs in G, 
a set W := {W\, . . . , Wn} is also defined, with Wi associated with Q t . As a result, for k > fc , if Q{k) = Qi then 
W(k) = Wi. Therefore, D u M i: N h D bi , M bi , N bi , J b i are uniquely defined by {Qi, Wi}. 



8 



With these choices stated above, the state of the following system is identical to that of (11) for the same initial 
conditions: 



e(k + 1) = J b e(k)e(k) + B b e( k )e{k) 1 k > k 



(13) 



where 9 : Z+ — > {1, . . . , N} is the switching process that is governed by the underlying Markov chain Q(k). The 
reason for the qualifier k > ko is that weights are not limited to the set W before ko, so technically the matrices 
Jb9(k) an d -Bhe(fe) are uniquely determined by the Markov chain only for k > ko. The error dynamics (13) is a 
Markov jump linear system (MJLS) [19]. To proceed with the analysis of the mean square convergence of (13), we 
need some terminology. 



7 :=E[e(fc)], 
:=E[e(A)], 



T := E[e(jfe)e T (A)] ) 
Q(fc) := E[e(fc)e T (fc)]. 



(14) 
(15) 



Furthermore, for a set of matrices Xi £ R^ lX ^ 2 , £ 
matrix diag[Xj] = diag{-Xi, . . . .Xn} and 



i,j = 1,...,N, denote the t\N x £ 2 N block diagonal 



[Yij] ■= 



<N1 



IN 



Y, 



NN 



l Nx( 2 N 



Now, define the matrices 



Ji 
Jbi 



= (M i + D i )- 1 (N i + D i )£M nxn , 

= (Mbi + D bi )~ 1 (N bi + D bi ) £ w ibXnb , 

= Ji®Ji£ R n " xn \ F bl := J bl <E>J bl £l 



(16) 



where ® denotes the Kronecker product. Furthermore, define the matrices 



V 
T> b 
C b 



(V T <E> I) diag[Fi] = [ Pjl Fj] el 
{T T ® I)diag[F bi ] = [ Pjl F bj ] £ 
{V T ® T)diag[J bl ] - \pjiJbj] £ R N n b xNn^ 



sNnfxNnf 



(17) 
(18) 



where I is an identity matrix of appropriate dimension. Recall that V is the transition probability matrix of the 
Markov chain. 



The key to establish Theorem 2, is the following technical result and the proof is provied in the Appendix B since 
it requires introduction of considerable new terminology. 

Lemma 5 When the temporal evolution of the graph Q(k) is governed by a homogeneous ergodic Markov chain 
whose transition probability matrix V has the property that its diagonal entries are strictly positive, then p(D b ) < 1 
if and only if the union graph Q defined in (9) is connected, where V b is defined in (18) and p(-) denotes the spectral 
radius. If Q is not connected, p(V b ) = 1. 

The following definitions and terminology from [19] will be needed in the sequel. Let M. ilXi2 be the space of l\ x l 2 
real matrices. Let H 1 2 be the set of all N-sequences of real l\ x £2 matrices, so that V £ M ilX{ ' 2 means V = 
{Vi, V2, ■ ■ ■ , Vjv) where Vi £ R^ lX ^ 2 for i = 1, . . . , N. The operators ip and (p is defined to create a tall vector by 
stacking together columns from these matrices, as follows: let {Vi)j £ M. £l be the j-th column of Vi £ K^ lX ^ 2 , then 

^):=[(K)f (^] T eR Wa (19) 

ftV) := [<p(V 1 ) t ,...MVn) T ] T 6K Wl<2 . (20) 
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Similarly, the inverse function <p 1 : R^ 1 ^ 2 — > HF lXf2 is defined so that it produces an element of HI 1X2 given a 
vector in R Wl<! . 

Lemma 6 Consider the jump linear system (13) with an underlying homogeneous and ergodic Markov chain. The 
state vector e(k) of the system (13) converges in the mean square sense if and only if piVb) < 1, where V b is defined 
in (18). When mean square convergence occurs, then /x(fc) — > /x and Q(fc) — > Q 7 where 

N N 

»:=J2<1U Q:=^Q 4 , (21) 



i=i i=i 



where 



[q^...,ql] T = q:=(I-C b )-^ tR Nn >, 

(Qu ...,Qn) = Q:= r 1 ((I - V b )-^{R{q))) G H"» x ' 



and tp, R(q) are given by 



N 



1> := [iff, . . .,^ T N ] T G R Nn \ 4> ■= Y^PioBbil^i G 

i=l 

R(q) :=(iii(«) M9))er^, 

AT 

^•(g) := 2P«( B w rB wTi + Juqil T Bl + B bi7 qJ './£)) G M" 6X " 6 . 



Moreover, Q is positive semi- definite. 



Proof: It follows from Theorem 3.33, Theorem 3.9, and remark 3.5 of [19] that mean square convergence of (13) is 
equivalent to p(V b ) < 1. The expressions for the mean and correlation, as well as the fact that Q > 0, also follow 
from [19, Proposition 3.37,3.38]. The existence of the steady state distribution it (that appear in the formulas) follows 
from the ergodicity of the Markov chain. □ 



Now we are ready to prove Theorem 2. 



Proof of Theorem 2 (Sufficiency): It follows from the hypotheses and Lemma 5 that we have p(T> b ) < 1. It then 
follows from Lemma 6 that the state converges in the mean square sense. (Necessity): If the union of graph is not 
connected, we have from Lemma 5 that p{T> b ) = 1. This shows that (due to Lemma 6) convergence will not occur. □ 



5 Simulation studies 



As discussed in Section 2.1, skew and offset estimation are special cases of the problem of estimation of scalar node 
variables from relative measurements. Therefore simulations are conducted only for scalar node variable estimation. 
In all simulations, node variables are chosen arbitrarily, a single reference node is present, and the value of the its 
node variable is 0. The noise on each measurement is a normally distributed random variable. All the edge weights 
are assigned a value of unity at every time. 
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Fig. 2. The three graphs Q\, G2,Gs that comprises G. Node 1 is the reference. 
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Fig. 3. Mean and variance of the estimate of node 3's node variable as a function of time. The empirical estimate of mean 
and variance is computed from 1000 Monte Carlo experiments. In (&), the "steady-state" corresponds to the limiting standard 
deviation predicted by Lemma 6. 



5.1 Four-node network with Markovian switching 



In this scenario the nodes move in such a way that the graph Q(k) can be one of only 3 graphs shown in Figure 2. 
The graphs change according to a Markov chain whose transition probability matrix is 



0.3 0.7 
0.1 0.5 0.4 
0.5 0.5 



(22) 



Notice that none of the graphs is a connected graph, though the union of the graphs in G is connected. Also, V 
is ergodic. The mean and variance of measurement noise on every edge are chosen as and 10 -4 , respectively. 
The limiting means and variances of the estimates therefore can be computed from the predictions of Lemma 6. 
Monte-Carlo experiments are conducted to empirically estimate the mean and variance of the estimation error, by 
averaging over 1000 sample runs. Figure 3(a) and Figure 3(b) show the empirically estimated mean and variance of 
node 3's estimate of its node variable. As predicted by Theorem 2, the mean of the estimate converges to the true 
value, since the measurement noise is mean. The variance also converges to the theoretical steady state variance 
as predicted by Lemma 6. 



5.2 A 100-node network with RWP mobility model 



Here 100 nodes move in a 1000 m x 1000 m square according to the widely used Random Waypoint (RWP) mobility 
model [27]. It has been justified in the Appendix A that the graph switching process in this mobility model can be 
reasonably modeled as a (first order) Markov chain. The parameters maximum/minimum speed and pause time are 
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Fig. 4. Two graphs that occur during a simulation with 100 nodes moving according to the random waypoint mobility model. 
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Fig. 5. The estimates of two nodes in one of the numerical experiments involving the 100-node mobile network. 
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Fig. 6. Empirically estimated mean and variance of the estimation error for one of the nodes in the 100-node mobile network. 



v-min = 10 m/s, v max = 50 m/s, and t p = 0.1s. The communication range is chosen as 100 m, and a link failure 
probability of 0.1 is used. The mean and variance of the measurement noise are chosen as and 10~ 4 . Figure 4 
shows two snapshots of the network during one of the simulations. Figure 5 shows the time trace of the estimates 
of two nodes in one of the simulations. The mean and variance of the estimation error was empirically computed 
from 1000 Monte Carlo simulations. Figure 6 shows mean and variance of the estimation error for two nodes. The 
figure suggests that the estimates of the node variables converge in the mean square sense. Note that the transition 
probability matrix is not known and the large state space makes it infeasible to compute the theoretical predictions 
of limiting mean and variances that are given in Lemma 6. One purpose of these simulations is therefore to test the 
performance of the algorithm when theoretical predictions are not available. 
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6 Summary 



We analyzed a distributed algorithm for estimation of clock skew and offset of the nodes of a mobile network and 
examined its convergence properties. The algorithm allows nodes to put different weights on estimates received from 
distinct neighbors, depending on the accuracy of the corresponding relative measurements. The time variation of the 
network was modeled as a Markov chain, which makes the algorithm a jump linear system. Under the assumptions 
that the Markov chain is ergodic and the diagonal entries of its transition probability matrix are positive, the 
estimates were shown to be mean square convergent as long as the union of the graphs over time is connected. 

Expressions for the asymptotic mean and correlation are also provided by using results from jump linear systems 
from [19]. Evaluating these expressions requires summation of TV terms, where TV is the number of distinct graphs 
that can occur. In general TV is a very large number, so the utility of these expressions is limited in the general 
setting. For instance, if no restriction is placed on the motion of the nodes or edge formation, TV is the number of 
distinct graphs possible with n nodes, which is 22 n (™ -1 ). Clearly, this is a very large number unless n is extremely 
small. However, in special situations TV can be smaller, e.g., if certain nodes are restricted to move only within certain 
geographic areas. 

In time-varying systems, the rate of change is an important parameter. The assumption that Markov chain satisfies 
Pa > provides an upper bound on how fast nodes can move and the network can change (compared to the time 
required to obtain relative measurements and current estimates). This assumption was used to prove Theorem 2. 
However, it is possible that mean square convergence can be proved with weaker constraints on the speed of topology 
change. 

We have not examined the question of convergence rate. It is likely that the transition probabilities of the chain 
will play a role in the convergence rate. However, precisely characterizing of the convergence rate of the algorithm 
remains an open problem. The time to reach acceptable estimation accuracy can however be reduced by more careful 
choice of the initial condition, e.g., using the flagged initialization scheme proposed in [11]. 
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A Markovian model of topology change 

Here we examine the question of the applicability of the Markovian model of graph switching. An example in which 
the time variation of the graphs satisfies the homogeneous Markov model is a network of mobile agents whose 
motion is modeled with first order dynamics with range-determined communication. In ad-hoc networks literature 
this is referred to as the random walk mobility model [27]. Specifically, suppose the position of node u at time k, 
denoted by p u (k), is restricted to lie on the unit sphere S 2 = {x e K 3 |||x|| = 1}, and suppose the position evolution 
obeys: p u {k + 1) = f{p u {k) + A„(fc)), where A u (fc) is a stationary zero-mean white noise sequence for every u, and 
E[A u (fc)A„(fc) ] = unless u — v. The function /(■) : K 3 — > S 2 is a projection function onto the unit-sphere. In 
addition, (u, v) e £(fc) if and only if the geodesic distance between them is less than or equal to some predetermined 
value. In this case, the graph Q(k) is uniquely determined by the node positions at time k, and the prediction of 
G(k + 1) given Q(k) cannot be improved by the knowledge of the graphs observed prior to k: Q{k — 1), ... , Q(0). 
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Fig. A.l. The standard shapes of estimated conditional entropy for three different cases: (a) independence, (b) first-order 
dependence and (c) second-order dependence. If a process is a first order Markov chain, empirically computed conditional 
entropy will show a trend similar to the one in (b). 

Hence the evolution of the graph sequence satisfies the Markovian property. If in addition random communication 
failure leads to two nodes not being able to communicate even when they are in range, the Markovian property is 
retained if the communication failure is i.i.d. 

However, it is not straightforward to check if the sequence of graphs generated by the model satisfies the Markovian 
property for other mobility models. A general method of checking Markovian switching of graphs is therefore needed. 
We borrow a method that is proposed in [28] to check if a stochastic process is Markov from observations of the 
process. We first introduce some standard notation from information theory. Let X be a discrete random variable 
with sample space Q = {1, . . . , N} and probability mass function p(x) = P(X = x), where x 6 il. The entropy of X 
is defined by 

#P0 = l°g pC«0- (A.l) 

The definition of entropy is extended to a pair of random variable X, Y, where X, Y £ fi, as follows 

H(X, Y) ;= — P(*> V) log P(x, y). (A.2) 

The conditional entropy H(Y\X) is defined as 

H(Y\X) :=-J2 V) lo S P( x \y) = H ( X > Y ) - H ( X )> ( A - 3 ) 

x.y (EH 

where p(x, y) is joint probability mass function. The conditional entropy measures the conditional uncertainty about 
an event given the another event. Consider a stochastic process {Xi,X 2 , . . . }. Assuming the process is stationary, 

we denote H(single) := H(Xk) and H(double) :— H(Xk, Xk+i), for all k = 1, It is straightforward to show 

that H(double) = 2H(single) if the successive random variables Xk are i.i.d. In this case the random process is a 
zero-order Markov process. If the random variables Xk are not independent, H < H(double) < 2H. To address the 
question of whether it is (m-th order) Markov, we extend the entropy definition to multivariate random variables, 
with H (triple) := H(Xk, Xk+i, Xk+2), etc. Now, the sequence H = log(iV), H± = H(single), H 2 = H(double) — 
H(single) and H3 = H(triple) — H (double), etc, measures the conditional uncertainty for each order of dependence. 
A graphical approach is given in [28] to determine the order of dependence of a random process by plotting the 
estimates of each Hi, where i — 1,2,... and examining the shape of the curve. The estimate of each Hi can be 
calculated from observations. Figure A.l shows the standard shapes for independence, first-order dependency, and 
second-order dependency. If the process is independent, then knowing the value of Xk will not help in predicting 
Xk+\, which is seen in the flat shape of the entropy function in Figure 1(a). In contrast, the sharp drop from H\ 
to H 2 in Figure 1(b) indicates that knowing the value of Xk-i will dramatically decrease the uncertainty in the 
prediction of Xk, while the values of Xk~2, Xk-3, ■ ■ ■ will not help much. This accords with the dependence property 
of a first-order Markov chain. Similarly, Figure 1(c) indicates that the previous two variables Xk-2, Xk-i are both 
important to predict Xk ■ In this case the process is better modeled as a second order Markov chain. 

In order to conclude whether the evolution of graphs is governed by a first-order Markov chain, we adopt the method 
discussed above as follows. For a particular mobility model, we conduct a simulation and collect observations of the 
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Fig. A. 2. Empirically estimated conditional entropy for the graph process {Qk} with three nodes moving according to the 
random waypoint mobility model. 



graph sequence. Since the underlying sample space G of the stochastic process Q(k) is finite, the method described 
above is applicable. We then use the approach above to check whether the plot of Hi estimated from the collected 
observations is closer to that in Figure 1(b) than to those in Figure 1(a) or Figure 1(c). If so, we declare that it is 
reasonable to model the graph switching process as Markovian. 

As an illustrative example, we consider the widely used random waypoint (RWP) mobility model [27]. In the RWP 
model, each node is initialized to stay in its initial position for a certain period of time (so called pause time t p ). Then, 
the node picks a random destination within the region it is allowed to move and a speed that is uniformly distributed 
in [vmimVmax]. Once node reaches the new destination, it pauses again for t p before starting over. We conduct a 
simulation of the RWP model with 3 nodes, where v min , v max , t p are chosen as 10 m/s, 50 m/s and 0.1 s. The nodes 
are allowed to move in a region 10 x 10 m. Nodes' positions are initialized randomly. The sample space consists of 8 
graphs. By performing the simulation for a long time (10 4 s), we obtain a large number of observations of the process 
{G(k)}. The probability mass function is empirically estimated from the observations. For estimating conditional 
entropies, certain conditional probabilities, especially those of the type P(Q(k) = G\\Q{k — 1) = G2, Q{k — 2) = G3), 
are problematic since the relevant events may not be observed even in a very long sequence of observations. In this 
case we set the corresponding probabilities to and use OlogO = 0. The empirically estimated conditional entropies 
Hi are shown in Figure A. 2. Clearly, the shape of curve is similar to that in Figure 1(b). Therefore, we conclude 
that the graph switching process in RWP mobility can be reasonably modeled as a (first order) Markov chain. 

Note that in RWP mobility, prediction of the future node locations (and therefore the graph) based on knowledge of 
past and present may be more accurate instead of prediction based on only the present. Therefore it is quite possible 
that the graph switching is not first-order Markov. However, the results of the test above shows that a Markov model 
quite accurately captures the graph switching process with RWP mobility. 

B Proof of Lemma 5 

Recall that a non- negative matrix is called stochastic matrix if each row sum is 1. If X is a stochastic matrix, then 



p(X) = 1 [31]. 



Proposition 7 (1) If X is stochastic matrix, X ® X is also a stochastic matrix. 

(2) The matrices Ji and Fi, i = 1, . . . , N , defined in (16) are stochastic matrices. 

(3) Let 



( Fx F 2 



•• F N \ 



( Fbi Fb2 




Fi F2 



■ ■ Fn 



Fbi F 2 



FbN 



K := 



\Fi F 2 



•• F N J 



Nn 2 xNn 2 



\Fbi Fb2 



FbN J 



Nn\y,Nn\ 



There exists a permutation matrix X so that Kb is a principal sub-matrix of X T KX. 
(4) For the matrix V defined in (18), p{T>) < 1. 
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Proof: The first two statements arc straightforward to establish. The third statement follows from the fact that Ju 
is a principal submatrix of Ji. We therefore prove only the fourth statement. From (17), p(T>) = pQpjiFj]). Since 
PjiFj is a non-negative square matrix, it follows from [32, Theorem 3.2] that p(\pjiFj]) < pQHfJji-Fj ||oo])- Moreover, 
Ibji-Fjlloo = Pjill-Pj'lloo- Since Fj is a stochastic matrix, ||-Fj||oo = 1- We therefore have 

P(D) < p(MI^II°o]) - P{\pji\) = P(V T ) = PV>) = 1- □ 



Lemma 8 Let V be the transition probability matrix of an N -state ergodic Markov chain whose diagonal entries are 
positive. The V defined in (17) is irreducible if and only if the union graph Q defined in (9) is connected. 

The proof of Lemma 8 is postponed and we first prove Lemma 5. 



Proof of Lemma 5 Since the union graph Q is connected, it follows from Lemma 8 that T> is irreducible. From 
the third statement of Proposition 7, there exists a permutation matrix X, such that T>\, is a principal submatrix of 
X T VX. The spectral radius of an irreducible matrix is strictly greater than the spectral radius of any of its principal 
submatrices, which follows from Theorem 5.1 in [31]. Therefore we have 

p{V b ) < p(X T VX). 

From the fourth statement in Proposition 7 and the fact that permutation does not change eigenvalues, it follows 
that 

p{X T VX) = p(V) < 1. 

Combining these two inequalities we get that if Q is connected then pifDb) < 1. To prove necessity, we construct a 
counterexample, in particular, a trivial Markov chain with a single state: G = {Qi} (so that V = 1) where Q\ is a 
n-nodc graph without any connected edge. Then = Fbi = Jbi ® Jbi — I, which has a spectral radius of unity. 
This completes the proof of the lemma. □ 



The proof of Lemma 8 needs the following definition and results. All matrices are non-negative hereafter; so we will 
explicitly say "non-negative" only when we have to stress it. For matrices X\, X 2 of same dimension, we say X\ and 
X 2 are congruent, and write X\ = X 2 , if the following holds: {Xi) ^ if and only if {X 2 ) i:j ^ 0. We also write 

X 1 h X 2 if the following condition is satisfied: ^ if (X 2 ) 4J / 0. The directed graph G(X) = {1/{X), £{X)) 

corresponding to a square matrix X G M. nxn is a graph defined on n nodes in which (u,v) G "E{X) if and only if 
X u ,v 7^ 0. A directed graph G is called strongly connected if for each pair of nodes u and v, there is a sequence of 
directed edges in £ leading from utow [33] . If G\ is a subgraph of G 2 , meaning that G 2 contains all the nodes and 
edges of G\, we write G\ C G 2 or G 2 DGi. Two directed graphs Gi and G 2 are called congruent if their adjacency 
matrices are congruent. We denote by Adj(G) the adjacency matrix of the graph G. For a n x n square matrix X, 
we write Adj(X) to denote Adj(G(X)), which is an n x n matrix with i, j-th entry equal to 1 if and only if X v > 0, 
and otherwise. Essentially, the matrix Adj(X) replaces the positive entries of X by 1 and leaves the entries 
untouched. 

The following statements for non-negative matrices can be verified in a straightforward manner. All the matrices are 
of the same dimension. 

Proposition 9 (1) X = Adj(X). 

(2) G(Xi) = G{X 2 ) if and only if X l = X 2 . 

(3) G(Xi) D G{X 2 ) ifX 1 hX 2 . 

(4) G(E-^) = uf =1 G(^). 
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Proposition 10 The graph G(X) is strongly connected if and only if X is irreducible. IfG(X) is strongly connected, 
then G(X ® X) is also strongly connected and thus X ® X is irreducible. 

The first statement of the proposition is well-known [33, pp.671]. The second statement follows from the first in a 
straightforward manner. 

Now we define the Cartesian product of two directed graphs G\ ~ (l^i, £1) and G 2 = (1^2, £2), which is denoted by 
G\UG 2 . The Cartesian product has the vertex set equal to V\ x 1^, so that nodes in the product are denoted by 
the pair (u, v), with u G l^i and s£ ^2, which is not to be confused with an edge. In order to prevent confusion, we 
will denote an edge from u to v in the sequel by u — > v. The edge set of the Cartesian product is characterized by the 
following property: there is an edge (u\, vi) — > (U2, v 2 ) in G\OG 2 if either u\ = u 2 and v\ — > v 2 G £2 or v\ — v 2 and 

u\ — V u 2 € E\. Cartesian products of undirected graphs Q\ and Q 2 are similarly defined, except that the resulting 
product graph is also undirected. The following properties will be useful in future. 

Proposition 11 (1) If G\ and G 2 are strongly connected, so is G\UG 2 . 
(2) If Adj(Xi) and Adj(X 2 ) are symmetric, then, 

Adj(G(X 1 )UG(X 2 )) = Adj(X x ) ® I + I ® Adj(X 2 ). (B.l) 



Proof of Proposition 11 The first statement is from [34, Table 2]. To prove the second statement, we intro- 
duce the notation G(X), which is the undirected graph corresponding to a symmetric matrix X. It follows that 
Adj(G(X x )UG(X 2 )) = Adj(G(Adj(X 1 ))DG(Adj(X 2 ))). From [35, Section 2.3], Adj(G(Adj(X 1 ))UG(Adj(X 2 ))) = 
Adj(Xx) ® I + I O Adj(X 2 ). Thus, we prove (B.l). □ 

The following results will be useful in the proof of Lemma 8. 

Proposition 12 If the union graph Q — li^Qi is connected, then L)fL 1 G(F i ) is strongly connected. 



Proof of Proposition 12 Recall that Fi = J, eg) J i: where Ji — (Mi + Di) 1 (N i + Di). Since Mi + Di, and therefore 
its inverse, is a diagonal matrix with positive entries, Ji = (Ni + Di). By property 2 of Proposition 9, 

N / N \ 

^iG(Fi) S G(J2 Fi) — G i ^{(iV, + D t ) ® (Ni + A)} (B.2) 

i=l \i=l J 

We also have 

{Ni + ® {Ni + D^ ±Di®Ni + Ni®Di (B.3) 
by dropping two terms in the expansion using their non- negativity. Since Di = /, we get 

(Ni + Di) ®(Ni + Di)yi®Ni + Ni®I 

N / N \ / N \ 

=>5Z((JV i + A)®(iV i + A)) hi® \J2 Ni ) + \J2 Ni ) < E>L 

i=\ \i=\ ) \i=l ) 

Using property 3 in Proposition 9, we have 

(N N \ N N 

(J2Ni)®I + I®(J2 Ni) £* G(J2 NipG(J2 Ni). (B.4) 
i=l i=\ ) i=l i=l 

where the congruence follows from the second statement in Proposition 11. Recall the structure of Ni, it follows that 
(i) Adj(Qi) = Adj(Ni), and (ii)Adj(Ni) is symmetric. As a result, Adj(^2,f =1 Ni) is also symmetric. Since Q — U^C/i 
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is a connected undirected graph, its adjacency matrix is irreducible. This means Adj(\jf =1 Qi) = J2i=i Adj(Qi) = 
Adj (Ni) is irreducible. Due to the first statement in Proposition 11, G(^ i=1 Afj)DG(^ i=1 TVj) is strongly 
connected. The result of this proposition now follows from (B.4). □ 

The Kronecker product of two graphs G\ = (1^, £1) and G 2 = (l^, £2), denoted by G\ G 2 , has the vertex set 
equal to V\ x V2 and an edge set that is characterized by the following property: there is an edge (u\, v\) — > (^2,^2) 
in Gi G2 if and only if u\ — > u 2 £ £1 and «i — >• u 2 £ £2 [34]. Note that the Cartesian and Kronecker products 
GiDG 2 and Gi G 2 have the same vertex sets but distinct edge sets. We have the following property of Kronecker 
product of graphs from [36] : 

Adj{G x G 2 ) = Ad 3 {Gi) Adj(G 2 ). (B.5) 

Adjacency matrices of both Cartesian and Kronecker products of two graphs are related to the adjacency matrices 
of the individual graphs through the matrix Kronecker product, cf. (B.l) and (B.5). 

Now we are ready to prove the Lemma 8. 

Proof of Lemma 8 (Connectivity => irreducibility): Here we have to prove that if the union graph Q is connected 
then the matrix V is irreducible. We will prove it by showing that the directed graph QiV) is strongly connected. 
Let Zj and Sj be the diagonal and off-diagonal parts of Fj. Since Zi is a non- negative matrix with positive diagonal, 
we get 

V = \pjiZj] + \pjiSj] = \pjil n 2] + [p^Sj] 

hP T ®I + diag\paSi] (B.6) 

where we have used the fact that \pjil n 2 ] = ~P T / and dropped the off-diagonal blocks of \pjiSj]. Therefore 

g(V) D G(T T 1) \J G(diag[Si]) 

= {G(^ T )0G(/)}|jG(^a 3 [^]) 

where the equality follows from the property (B.5) of Kronecker product of graphs. We will now show that the 
directed graph {G(V T ) G(I)} \J G(diag[Si\) is strongly connected, which proves that G(V) is as well. 

First notice that there are Nn 2 nodes in the graph G(V), so are V T / and diag[Si\. It is convenient to imagine 
them as N x N block matrices, with each block being of dimension n 2 x n 2 . Therefore, we introduce a useful new 
notation. Let's index a node by the pair (p i} d K ), which is the ((i — 1)N + /t)-th node in G(V), where i — 1, . . . , N and 
k = 1, . . . , n 2 . This notation is similarly suitable for V T ®I and diag[Si\. To prove that {G(V T )®G{I)} (J G{diag[Si\) 
is strongly connected, we need to show the following 

There is a path from an arbitrary node(p i} d K )to another arbitrary node{pj 1 d u ) 

in the graph {G{V T ) ® 6(1)} (J G(diag[Si\). (B.7) 

The following properties will be used to construct a proof of (B.7): 

(1) si: There exists a path from (pi,d K ) to (pj, d K ) in G(V T /) for alH, j = 1, . . . , N and k = 1, . . . , n 2 . 

(2) s2: If d K — > du is an edge in G(Se), then (pe, d K ) — > (pe, dh) is an edge in Q(diag[Si\). 

The first statement is proved as follows. Since the Markov chain is ergodic, V - and therefore V T - is irreducible, 
which means G(V T ) is strongly connected. Thus, given arbitrary nodes p and q in Q(T T ), there is a path connecting 
them in Q(V T ). Call this path p, Ui,u 2 , . . . 7 u„ l7 q. Since the edge d K — >• d K G Q(I) exists for every k, it now follows 
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from the definition of Kronecker product of graphs that the path (p, d K ), (u\,d K ), (u 2 ,d K ), . . . , (u m , d K ), (q, d K ) exists 
in G(V T ) ® G(I) for every n = 1, . . . , n 2 . The statement si is now proved upon replacing p and q by pi and pj. 
The statement s2 is true because of the structure of the matrix diag[Si] and the node indexing scheme described 
immediately before (B.7). 

From Proposition 12, we have that U^ 1 G(F i ) is connected. Since Si is the off-diagonal part of Fi, U^ 1 G(S'i) is 
connected as well. Therefore, there is a path from an arbitrary node d K to another arbitrary node d v in U^L 1 G(Si), 
for all k, v in {1, . . . ,n 2 }. To prove the statement (B.7), pick such a path from the node d K to the node d v in 
^iLiG(Si), where each edge in the path may lie in any of the graphs {G(S'i)}^ 1 . For the sake of concreteness and 
compactness, let us consider a path of length two, consisting of the two edges d K — > dh and dh — > d v , which belong 
to the graphs, say, G(Se) and G(S m ), respectively. From si we have proved above, we know that there is a path 
from the node (pi,d K ) to the node (pe, d K ) in the graph Q(V T <8> I), call this path path[(pi, d K ) ~> (pe, d K )]. From s2, 
we have that the edge (pe, d K ) — > (pe, dh) exists in the graph Q(diag[Si\) due to the existence of the edge d K — > dh 
in Q(Si). Thus, we have the path from (pi, d K ) to (pe, dh) in the combined graph {G(V T ) ® G(I)} [J G(diag[Si\) by 
joining the path path[{pi, d K ) ^ (pe, d K )] with the edge {pi, d K ) — > {pi, dh). Using this idea repeatedly, we construct 
a path from (pi, d K ) to (pj,d v ) in {G(P T ) (8> G(I)} {J G(diag[Si\) as follows: 

path[{ Pi , d K ) ~> (p t , d K )], in G(V T ) ® G(I) 

(pe, d K ) ->• (p^, dfc), e G(dia5[iS , j]) 

potft[(p/, d h ) ~» (p m , d fc )], in G(P T ) ® G(/) 

(Pm,d h ) ->(p m ,d v ), € G^m^fS'i]) 

pa^[O m , d„) ~» (pj-, ^)], in G{T T ) ® G(/), 

where each path[-\ exists due to the property si established above, and each edge exists due to the property s2 
as well as with the assumed existence of the edges d K — > dh and dh — > d v in the union graph. This argument can 
be extended to a path of any length between d K and d v in the union graph U^ =1 G(Si). Thus, there is a path from 
(Pi,d K ) to (pj,d v ) in {G(V T ) ® G(I)} \J G(diag[Si\), which proves sufficiency. 

(Not connected => reducible): A simple counterexample proves necessity. Construct a trivial Markov chain with a 
single state: G = {<7i} (so that V = 1) where Gi is an n-node graph without a single edge. Then V = F\ = J\®J\ = /, 
which is reducible. □ 
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